Bo Zhao

Afford-VLA: Action-Aligned Visual Planning via Internalized Affordance

May 22, 2026
Via arXiv

StepAudio 2.5 Technical Report

May 22, 2026
Via arXiv

AffectVerse: Emotional World Models for Multimodal Affective Computing

May 19, 2026
Via arXiv

Evo-Depth: A Lightweight Depth-Enhanced Vision-Language-Action Model

May 14, 2026
Via arXiv

Focusable Monocular Depth Estimation

May 12, 2026
Via arXiv

AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition

Apr 14, 2026
Via arXiv

SVC 2026: the Second Multimodal Deception Detection Challenge and the First Domain Generalized Remote Physiological Measurement Challenge

Apr 07, 2026
Via arXiv

Firebolt-VL: Efficient Vision-Language Understanding with Cross-Modality Modulation

Apr 07, 2026
Via arXiv

PhysNeXt: Next-Generation Dual-Branch Structured Attention Fusion Network for Remote Photoplethysmography Measurement

Mar 20, 2026
Via arXiv

TexEditor: Structure-Preserving Text-Driven Texture Editing

Mar 19, 2026
Via arXiv